System for presentation time stamp recovery from a transcoder

ABSTRACT

A method for transcoding a digital video stream includes transcoding, using a transcoder, a video stream that includes presentation time stamps for the video stream together with an audio stream that includes presentation time stamps for the audio stream. The transcoding modifies the presentation time stamps for the video stream such that a plurality of first values for presentation time stamps for a first set of video frames of the video stream are modified to a plurality of second values for presentation time stamps for a second set of video frames. The audio stream includes embedded first values for presentation time stamps in a first location. The method includes determining an offset of the second values of the second set of presentation time stamps of the transcoded video stream based upon the first values of the set of presentation time stamps embedded in the audio stream from the transcoder. The method includes combining the transcoded video stream and an associated audio stream based upon the offset. Preferably, the transcoder also modifies the audio time stamps. Preferably, the audio stream includes the embedded first values. Preferably, the offset is determined by taking the difference between the audio PTS in the PES and the embedded PTS in the audio packet (private data or embedded in the audio frame).

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 63/016,496 filed Apr. 28, 2020, the complete contents of which is incorporated herein by reference.

BACKGROUND

The subject matter of this application relates to a system for presentation time stamp recovery from a transcoder.

A video transcoding technique using a video transcoder is a process of converting a digital video signal having an initial set of characteristics into another digital video signal having a modified set of characteristics. For example, the resulting transcoded digital video signal may have a different bit rate, a different video frame rate, a different video frame size, a different color characteristic, a different set of video coding parameters, a different lossy video compression technique, and/or a different lossless coding of the video signal.

In many applications, such as a cable broadcast system, a full-resolution master video file is stored as a mezzanine file that is a compressed video file that when rendered is generally visually indistinguishable from a rendering of the full-resolution master video file. The mezzanine file format may be any suitable format, such as for example, an MXF file format or a MOV file format. The mezzanine file stored in a mezzanine file format is often modified to another file format when it is streamed to another device, such as an H.264 video stream, an H.265 video stream, a FLV video stream, an MPEG-1 video stream, an MPEG-2 video stream, an MPEG-4 video stream, a VC-1 video stream, a WMV video stream, a TAPE video stream, a ProRes video stream, a DNxHD video stream, or a Cineform video stream.

Often the modified file format is provided from a video distribution server that transcodes the compressed video stream, or the original coded mezzanine file, to a format suitable for distribution to a particular user or group of users. For example, a programmer for a broadcast distribution system may transcode the video stream to a format and/or a bit rate suitable for being distributed by a satellite transmission system to one or more users or groups of users that have a satellite receiver. For example, a headend system for a cable distribution system may transcode the video stream to a format and/or a bit rate suitable for being distributed by an integrated cable management termination system to one or more users or groups of users. For example, a video distribution server may transcode the video stream to a format and/or a bit rate suitable for being distributed through the Internet to one or more users or groups of users.

In some embodiments, as disparate video compression standards have proliferated, such as H.261, H.263, H.264, MPEG-1, MPEG-2, MPEG-4, etc., the demand for converting video streams from one digital video compression type to another digital video compression type and/or bitrate has steadily increased. In an embodiment of providing a source video stream to a plurality of users, each of which is using a different channel having different capabilities, the video stream is transcoded to a digital video format and/or a bitrate suitable for the particular user. By way of example, a video conferencing system often transmits a plurality of video streams where many of the video streams are transmitted with different respective bit rates over different data channels.

One exemplary transcoder may include a decoder, a transmission port, and an output of an encoder. The decoder may operate in synchronization with a time stamp of an encoder as follows. The encoder includes a main oscillator, which serves as a system time clock (STC), and a counter. The STC belongs to a predetermined program and is a main clock of a program for video and audio encoders.

The time stamps are used for time synchronization of different components with one another. When a video frame or audio block is input to an encoder, the encoder samples the STC from the video frame or the audio frame. A constant indicating a delay between the encoder and the decoder buffer is added to the sampled STC, thereby forming a presentation time stamp (PTS). The PTS is inserted in a header of the video frame or the audio frame.

In the case of reordering video frames, decode time stamps (DTSs), which indicate when each of the video frames is to be decoded by the decoder, are respectively inserted into the video frames. The DTSs, which are used for a frame reordering process, can be the same values as their respective PTSs including I, P, and unreferenced B pictures, and the DTSs and their respective PTSs may be different for I, P, and referenced B pictures. Whenever DTSs are used, PTSs are used.

According to the Advanced Television Systems Committee (ATSC) standard, a PTS or a DTS is inserted into a header of each picture. The encoder buffer outputs transport packets each having a time stamp called a program clock reference (PCR) or packetized elementary streams (PES) each having a time stamp called a system clock reference (SCR). The PCR is generated at intervals of 100 msec for MPEG and 40 msec for ATSC, and the SCR is generated at intervals of up to 700 msec. The PCR or SCR is used to synchronize an STC of the decoder with an STC of the encoder.

A program stream (PS) has an SCR as its clock reference, and a transport stream (TS) has a PCR as its clock reference. Therefore, each type of video stream or audio stream has a time stamp corresponding to an STC so as to synchronize the STC of the decoder with the STC of the encoder.

The MPEG based stream includes time information, such as a PCR or SCR, which is used for synchronizing an encoder with a decoder, an STC, and a PTS and a DTS, which are used for synchronizing audio content with video content. The MPEG stream is reconstructed using the decoder, and the time information is discarded after being used to synchronize the decoder with the encoder and to synchronize the audio content with the video content. Unfortunately, in some situations the time stamps are modified in a non-predetermined manner.

What is desired, therefore, are improved systems and methods for effective time stamp management from a transcoder.

BRIEF DESCRIPTION OF THE DRAWINGS

For a better understanding of the invention, and to show how the same may be carried into effect, reference will now be made, by way of example, to the accompanying drawings, in which:

FIG. 1 illustrates a transcoding system.

FIG. 2A illustrates an MPEG video Packetized Elementary Stream (PES).

FIG. 2B illustrates an MPEG transport stream.

FIG. 3 illustrates a video transcoder and an audio transcoder.

FIG. 4 illustrates an exemplary transcoding system with PTS recovery.

FIG. 5 illustrates a set of I, B, and P pictures.

FIG. 6 illustrates the ordering of a set of I, B, and P pictures.

FIG. 7 illustrates an exemplary transcoding system with PTS recovery and dejitter.

FIG. 8 illustrates a 3:2 pulldown technique.

FIG. 9 illustrates expanded presentation time stamps for two video segments.

DETAILED DESCRIPTION

Referring to FIG. 1, an exemplary transcoding system 90 is illustrated. The transcoding system 90 includes a transcoder 100. The transcoder 100 may include a timing synchronizer 110, a decoder 120, and an encoder 130. The transcoding system 90 may further include a demultiplexer 140 and a multiplexer 150.

The demultiplexer 140 receives an input transport stream (TS) or an input program stream (PS), extracts timing parameters from the input TS or PS, and transmits the extracted timing parameters to the timing synchronizer 110. The demultiplexer 140 extracts video data that was previously compressed in a predetermined manner from the input TS or PS and transmits the extracted video data to the decoder 120. In many video coding techniques, the timing parameters include a presentation time stamp (PTS), a decode time stamp (DTS), and a program clock reference (PCR).

The presentation time stamp is a timestamp metadata field in an MPEG transport stream, and other transport streams, that is used to achieve synchronization of the program's separate elementary streams (e.g., video stream, audio stream, subtitle stream, etc.) when presented to the viewer. The presentation time stamp is typically given in units related to a program's overall clock reference, such as a program clock reference (PCR) or a system clock reference (SCR), which is also transmitted in the transport stream or program stream.

The presentation time stamps typically have a resolution of 90 kHz, suitable for the presentation synchronization task. The PCR or SCR typically has a resolution of 27 MHz, which is suitable for synchronization of a decoder's overall clock with that of the remote encoder.
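By way of illustration only, the relationship between the two clock resolutions may be sketched as follows. The function names are hypothetical, and the split of the PCR into a 90 kHz base and a 0 to 299 extension is the conventional MPEG-2 systems layout, which is assumed here rather than stated above.

```python
PTS_CLOCK_HZ = 90_000        # resolution of PTS/DTS values
PCR_CLOCK_HZ = 27_000_000    # resolution of the PCR/SCR


def pts_to_seconds(pts: int) -> float:
    """Convert a 90 kHz time stamp to seconds."""
    return pts / PTS_CLOCK_HZ


def pcr_to_ticks(base: int, extension: int) -> int:
    """Combine a 90 kHz base and a 0..299 extension into 27 MHz ticks (assumed layout)."""
    return base * 300 + extension


def pcr_ticks_to_pts_units(pcr_ticks: int) -> int:
    """Express a 27 MHz clock reference in 90 kHz units (27 MHz / 90 kHz = 300)."""
    return pcr_ticks // 300
```

Under these assumptions, one second corresponds to 90,000 PTS units and to 27,000,000 PCR ticks.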

A transport stream may contain multiple programs and each program may have its own time base. The time bases of different programs within a transport stream may be different. Because PTSs apply to the decoding of individual elementary streams, they reside in the PES packet layer of both the transport streams and the program streams. End-to-end synchronization occurs when encoders save time stamps at capture time, when the time stamps propagate with associated coded data to decoders, and when decoders use those time stamps to schedule presentations.

Synchronization of a decoding system with a channel is achieved through the use of the SCR in the program stream and by its analog, the PCR, in the transport stream. The SCR and PCR are time stamps encoding the timing of the bit stream itself and are derived from the same time base used for the audio and video PTS values from the same program. Since each program may have its own time base, there are separate PCR fields for each program in a transport stream containing multiple programs. In some cases, it may be possible for programs to share PCR fields.

The timing synchronizer 110 keeps the timing parameters received from the demultiplexer 140 intact so that they still can be synchronized with segmentation metadata even after the video data has undergone a transcoding process, and transmits the timing parameters to the encoder 130 and the multiplexer 150. The decoder 120 restores the compressed video data received from the demultiplexer 140 to a video sequence using a predetermined decoding method and provides the video sequence to the encoder 130. The encoder 130 compresses the video sequence received from the decoder 120 according to predetermined conditions set by a transcoding parameter controller 160, records the timing parameters received from the timing synchronizer 110 in the compressed video sequence, and transmits the resultant compressed video sequence to the multiplexer 150. The transcoding parameter controller 160 may be configurable based upon user input, such as from a GUI, system determined, and/or based upon a particular application. The transcoding parameter controller 160 determines transcoding conditions suitable for an end user environment and provides the determined transcoding conditions to the encoder 130 and the timing synchronizer 110. The transcoding conditions include, for example, a video quality, a video resolution, a bit rate, and a video frame rate. The multiplexer 150 multiplexes the video sequence received from the encoder 130 creating an output TS or PS. The multiplexer 150 records the timing parameters received from the timing synchronizer 110 in a header of the output TS or PS. The segmentation metadata may have been extracted from the input TS or PS by the demultiplexer 140 or may have been provided by another metadata provider. Other transcoders may likewise be used, as desired.

Merely for matters of convenience, the discussion will be described using a TS rather than a PS as an example. For example, in the following paragraphs, only a PCR will be described as a reference time indicator, but an SCR may also be used as the reference time indicator in a case where a stream input to or output from the transcoding system 90 is a PS. Even if a PCR is input to the transcoding system 90 as a reference time indicator, an SCR may be output from the transcoding system 90 as the reference time indicator, and vice versa.

Referring to FIG. 2A and to FIG. 2B, an MPEG-2 packetized elementary stream (PES) packet and an MPEG-2 TS, respectively, are illustrated. Referring to FIG. 2A, an MPEG-2 video stream, which is compressed using an encoder, is packetized into PES packets. Each of the PES packets includes an optional PES header and a PES packet data field. The optional PES header 200 includes an optional field 210. The optional field 210 includes a PTS field 220 and a DTS field 230. The PTS information is recorded in the PTS field 220, and the DTS information is recorded in the DTS field 230.
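As a non-limiting sketch, the PTS field and the DTS field may be read from a PES header as shown below. The byte layout (a 33-bit value spread over five bytes with marker bits, flagged by the two most significant bits of the eighth header byte) is the conventional MPEG-2 systems layout and is assumed here. The function names and the sample packet are hypothetical.

```python
def encode_timestamp(value: int, prefix: int) -> bytes:
    """Pack a 33-bit time stamp into 5 bytes with marker bits (assumed MPEG-2 layout)."""
    return bytes([
        (prefix << 4) | (((value >> 30) & 0x07) << 1) | 1,
        (value >> 22) & 0xFF,
        (((value >> 15) & 0x7F) << 1) | 1,
        (value >> 7) & 0xFF,
        ((value & 0x7F) << 1) | 1,
    ])


def decode_timestamp(b: bytes) -> int:
    """Recover the 33-bit time stamp from its 5-byte encoding."""
    return (
        (((b[0] >> 1) & 0x07) << 30)
        | (b[1] << 22)
        | ((b[2] >> 1) << 15)
        | (b[3] << 7)
        | (b[4] >> 1)
    )


def parse_pes_timestamps(pes: bytes):
    """Return (pts, dts) from the start of a PES packet; either may be None."""
    if pes[0:3] != b"\x00\x00\x01":
        raise ValueError("missing PES start code prefix")
    pts_dts_flags = pes[7] >> 6          # '10' = PTS only, '11' = PTS and DTS
    pts = decode_timestamp(pes[9:14]) if pts_dts_flags & 0b10 else None
    dts = decode_timestamp(pes[14:19]) if pts_dts_flags == 0b11 else None
    return pts, dts


# Hypothetical video PES header carrying both a PTS and a DTS.
sample_pts, sample_dts = (1 << 33) - 1, 123_456_789
sample_pes = (
    b"\x00\x00\x01\xe0\x00\x00\x80"      # start code, stream id, length, flags
    + bytes([0xC0, 10])                  # PTS_DTS_flags = '11', 10 header data bytes
    + encode_timestamp(sample_pts, 0b0011)
    + encode_timestamp(sample_dts, 0b0001)
)
```

Parsing `sample_pes` with `parse_pes_timestamps` recovers the two values that were packed into it.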

Referring to FIG. 2B, a TS packet, which is formed through a multiplexing process, is 188 bytes long and includes a header 240 and a payload 250. A PES packet, a program association table (PAT), or a program map table (PMT) is contained in the payload 250. The header 240, which starts with a sync byte, includes various fields, such as an adaptation field 260. The adaptation field 260 includes an optional field 270, and the optional field 270 includes a PCR field 280. PCR information, which is reference time information, is recorded in the PCR field 280.
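A minimal sketch of reading the PCR field from a 188-byte transport packet follows. It assumes the conventional MPEG-2 layout (sync byte 0x47, an adaptation field flagged in the fourth byte, and a 48-bit PCR made of a 33-bit base, 6 reserved bits, and a 9-bit extension). The helper that builds a test packet and all names are hypothetical.

```python
TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def build_ts_packet_with_pcr(base: int, ext: int, pid: int = 0x100) -> bytes:
    """Build an adaptation-field-only packet carrying a PCR (illustration only)."""
    header = bytes([SYNC_BYTE, (pid >> 8) & 0x1F, pid & 0xFF, 0x20])
    pcr = ((base << 15) | (0x3F << 9) | ext).to_bytes(6, "big")
    adaptation = bytes([183, 0x10]) + pcr      # field length, PCR_flag set
    return (header + adaptation).ljust(TS_PACKET_SIZE, b"\xff")


def parse_pcr(packet: bytes):
    """Return the PCR in 27 MHz ticks, or None if the packet carries no PCR."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte transport packet")
    has_adaptation_field = packet[3] & 0x20
    if not has_adaptation_field or packet[4] == 0:
        return None
    if not packet[5] & 0x10:                   # PCR_flag
        return None
    b = packet[6:12]
    base = (b[0] << 25) | (b[1] << 17) | (b[2] << 9) | (b[3] << 1) | (b[4] >> 7)
    ext = ((b[4] & 0x01) << 8) | b[5]
    return base * 300 + ext
```

The value returned is in 27 MHz ticks; dividing by 300 expresses it in the 90 kHz units used by the PTS and DTS.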

Referring to FIG. 3, in some cases it is desirable to use a particular video transcoder 300 to transcode a video stream 330 of a digital video stream (e.g., packetized elementary stream) 320, and a separate audio transcoder 310 to transcode an audio stream 340 of the digital video stream (e.g., packetized elementary stream) 320. The video transcoder 300 typically also includes the capability of processing a corresponding audio stream and maintaining the synchronization between the received video stream and audio stream. For example, the video transcoder 300 may have desirable transcoding characteristics, such as a very high quality video encoding with relatively low bit rates. For example, the video transcoder 300 may have other desirable transcoding characteristics, such as a software application that is suitable to operate on a common off the shelf server (e.g., a server in a public and/or a private data center). For example, the audio transcoder 310 may have desirable transcoding characteristics, such as a reduced bitrate or a reduced sampling rate. In this manner, the video stream from a digital video stream may be provided to the video transcoder 300 and the audio stream from the digital video stream may be provided to the audio transcoder 310. The output of the video transcoder 300 and the output of the audio transcoder 310 may be combined 350, such as into a packetized elementary stream 360. Unfortunately, as a result of the video transcoding process the video transcoder tends to modify the values of the presentation time stamps in some manner, such that the presentation time stamps associated with a set of video frames are different than the presentation time stamps associated with the same set of video frames resulting from the video transcoder (e.g., such as the relative difference between the respective presentation time stamps is modified).

It is problematic to automatically resynchronize the audio stream, as originally provided with the digital video stream or as transcoded by the audio transcoder, because the synchronization between the audio stream and the video stream has been lost when the presentation time stamps of the video stream have been modified in an unknown manner. If the audio stream is not properly aligned with the video stream, then the audio content will not match the corresponding video content. It is noted that the PTSs of the video stream and the corresponding PTSs of the audio stream are normally different numbers, but the known difference between the distinct PTS numbers (e.g., video stream and audio stream) provides the synchronization.

Referring to FIG. 4, it was determined that since the video transcoder 400 tends to modify the presentation time stamps encoded together with the video stream from that provided at its input to modified presentation time stamps provided from its output, it is desirable to recover the original presentation time stamps provided to the input of the video transcoder 400 in some manner. Typically the video transcoder 400 includes the capability of processing an input audio stream 402 and an input video stream 404 (such as being provided separately or otherwise encoded together as a packetized elementary stream), while maintaining a synchronization of the video stream 404 and the corresponding audio stream 402 at its output. With the understanding that the synchronization of the video stream 404 and the corresponding audio stream 402 is maintained by the video transcoder 400 during the transcoding process, it was determined that the presentation time stamps of the input video stream 404 may be encoded within the input audio stream 402, in a suitable manner, that passes through the video transcoder 400 unchanged. The presentation time stamps of the input video stream 404 embedded within the audio stream 402, which remain unchanged as a result of the transcoding process, may be used to resynchronize the video stream 404 with an original audio stream 414 of a digital video stream 410 that includes both a video stream 412 and the audio stream 414. The video stream 412 is provided to the video transcoder 400 as the input video stream 404.

A video PTS extraction process 420 may process the video stream 412 of the digital video stream 410 to extract the presentation time stamps associated with the video stream 412. A canned audio stream 430 is provided by the system to a presentation time stamp embedding process 432 that also receives the extracted presentation time stamps associated with the video stream 412. The extracted presentation time stamps from the video stream 412 are embedded within the canned audio stream 430 by the presentation time stamp embedding process 432. The presentation time stamp embedding process 432 embeds the presentation time stamps within the canned audio stream 430 in a manner that remains unchanged as a result of the transcoding process of the video transcoder 400 and provides the input audio stream 402. The input audio stream 402 is synchronized with the input video stream 404 (such as being provided separately or otherwise encoded together as a packetized elementary stream) and is provided to the video transcoder 400 using a multiplexer 403. The video transcoder 400 provides an output transport stream 440 that includes both a transcoded output video stream 442 and an output audio stream 444. The output audio stream 444 includes the embedded presentation time stamps that remain unchanged as a result of the transcoding process.

For example, the presentation time stamps embedded within the separate audio stream 430 may be encoded within a private data portion of the encoded data stream. The private data portion may include, for example, one or more of the following: (1) a transport stream packet table 2-2; (2) a transport stream adaptation field table 2-6; (3) a packetized elementary stream packet table 2-17; (4) a packetized elementary stream packet header; (5) a packetized elementary stream packet data byte field; (6) a descriptor within a program stream and/or a transport stream; and (7) a private section table 2-30.

A presentation time stamp offset determination process 450 receives the transcoded output video stream 442 and the output audio stream 444 and extracts the presentation time stamps from the output video stream and extracts the embedded presentation time stamps and the normal presentation time stamps from the output audio stream 444. In this manner, three different sets of presentation time stamps may be extracted from the data obtained from the video transcoder 400. The comparison of the normal presentation time stamps from the output audio stream 444 with the presentation time stamps that were embedded within the input audio stream 402 provides an offset 452 between the two presentation time stamps which corresponds to the offset between the time stamps of the transcoded output video stream 442 and the video stream 412. The offset 452 is added to presentation time stamps of the transcoded output video stream 442 by a PTS offset adjustment process 454, to provide a transcoded video stream with adjusted presentation time stamps 460. The offset 452 may also be used to adjust the program clock references of the transcoded output video stream 442. The offset 452 may also be used to adjust the decode time stamps of the transcoded output video stream 442. The output audio stream 444 after extracting time stamps may be discarded, if desired.
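The offset determination and adjustment described above may be sketched as follows, purely as an illustration. The function names are hypothetical, and the modulo-2^33 arithmetic reflects the 33-bit width of MPEG-2 time stamps, which is an assumption not stated above.

```python
PTS_MODULUS = 1 << 33  # PTS/DTS values are 33-bit counters (assumed MPEG-2 convention)


def pts_offset(embedded_original_pts: int, audio_pes_pts: int) -> int:
    """Offset that maps a transcoder-modified time stamp back to its original value."""
    return (embedded_original_pts - audio_pes_pts) % PTS_MODULUS


def apply_offset(time_stamp: int, offset: int) -> int:
    """Add the offset to a transcoded PTS or DTS, wrapping at 33 bits."""
    return (time_stamp + offset) % PTS_MODULUS


# Hypothetical values: the output audio PES header carries 1,234,567 while the
# embedded (unchanged) video PTS reads 900,000.
offset = pts_offset(900_000, 1_234_567)
adjusted_video_pts = apply_offset(1_237_570, offset)
```

With these hypothetical values, a transcoded video PTS of 1,237,570 is adjusted to 903,003, which is one 29.97 frames per second frame interval after the embedded value of 900,000.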

An audio transcoder 470, if included, is used to transcode the audio stream 414 of the digital video stream (e.g., packetized elementary stream) 410. The output of the audio transcoder 470 may be combined 472 with the transcoded video stream with adjusted presentation time stamps 460, such as into a packetized elementary stream 474. Also, the audio stream 414 may pass-through 471 (which may include a buffer) to the combiner 472.

In another embodiment, the canned audio stream 430 may be replaced by the audio stream 414, where the presentation time stamps from the video stream 412 are embedded therein such that they are not modified as a result of the transcoding process by the video transcoder 400. The audio stream from the transcoding process of the video transcoder 400 may be discarded after extracting time stamps, if desired.

In another embodiment, the video transcoding process may include a 3:2 pulldown technique, so that there is not a one-to-one match between the video frames into the video transcoder and the video frames out of the video transcoder. The 3:2 pulldown technique converts 24 frames per second into 29.97 (or 30) frames per second. In general, this results in converting every 4 frames into 5 frames plus a slight slowdown in speed when converting 24 frames per second into 29.97.
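The 4-to-5 frame expansion may be illustrated with the following sketch. The alternating 2, 3, 2, 3 field cadence and the top/bottom field pairing are one common way of performing the pulldown and are assumed here; the function name is hypothetical.

```python
from itertools import cycle


def pulldown_3_2(frames):
    """Expand film frames into interlaced frames using a 2, 3, 2, 3 field cadence."""
    field_counts = cycle([2, 3])            # fields emitted per source frame (assumed cadence)
    parities = cycle(["top", "bottom"])     # field parity alternates continuously
    fields = []
    for frame, count in zip(frames, field_counts):
        for _ in range(count):
            fields.append((frame, next(parities)))
    # Pair consecutive fields into output frames; a trailing unpaired field is dropped.
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


output_frames = pulldown_3_2(["A", "B", "C", "D"])
sources = [(top[0], bottom[0]) for top, bottom in output_frames]
```

Four input frames A, B, C, D yield five output frames whose fields come from A/A, B/B, B/C, C/D, and D/D, so 24 frames per second become 30, and running the result at 30000/1001 frames per second gives the slight slowdown to approximately 29.97.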

Preferably, the input video stream is not modified to include the presentation time stamps in the private data sections for the video transcoder, to reduce the likelihood of introducing errors. Also, potentially there may not be available space to include the presentation time stamps in the input video stream. Moreover, in some cases the transcoder may drop the private data field when it generates the output PES header. Alternatively, the input video stream may be modified to include the presentation time stamps in a private data section that is not modified by the video transcoder.

Preferably, the input audio stream is not modified to include the presentation time stamps in the private data sections for the video transcoder, to reduce the likelihood of introducing errors. Moreover, in some cases the transcoder may drop the private data field when it generates the output PES header. Also, potentially there may not be available space to include the presentation time stamps in the audio stream of the digital video stream.

Referring to FIG. 5, many encoding schemas for a video stream include intra frames (i.e., I frames) that are pictures compressed based upon information only within the frame. The video stream may also include predicted frames (i.e., P frames) that are pictures predicted at least in part from previous I or P frames. The video stream may also include bi-directional predicted frames (i.e., B frames) that use past and future I and P frames for motion compensation. Depending on the type of encoding schema used, other types of frames may be used.

Referring to FIG. 6, in order for the decoder to reconstruct a B-frame from the preceding I and following P frames, both of the I and the following P frames should arrive before the B-frame. Accordingly, the order of frame transmission is different than the order in which the frames appear when presented. The use of the decode time stamps, which inform the decoder when to decode the frames, and the presentation time stamps, which inform the decoder when to render the frames, accommodates the timing of the decoding and rendering of the frames.
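The difference between presentation order and transmission (decode) order may be sketched as follows. The sketch assumes a simple group of pictures in which each B frame depends only on the nearest preceding and following I or P frame; the frame labels and the function name are hypothetical.

```python
def presentation_to_decode_order(frames):
    """Reorder frames so that each B frame follows both of its reference frames."""
    decode_order = []
    pending_b = []                      # B frames waiting for their following reference
    for frame in frames:
        if frame.startswith("B"):
            pending_b.append(frame)
        else:                           # I or P frame: send it, then the held B frames
            decode_order.append(frame)
            decode_order.extend(pending_b)
            pending_b.clear()
    decode_order.extend(pending_b)      # trailing B frames with no following reference
    return decode_order


presented = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
transmitted = presentation_to_decode_order(presented)
```

In this example the frames presented as I0, B1, B2, P3, B4, B5, P6 are transmitted as I0, P3, B1, B2, P6, B4, B5, which is why each frame may carry a decode time stamp distinct from its presentation time stamp.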

For MPEG-2, the Advanced Television Systems Committee (ATSC) provides for 1920×1080 progressive video at a frame rate of 23.976, 24, 29.97, or 30 frames per second. ATSC provides for MPEG-2 1920×1080 interlaced video at a frame rate of 29.97 frames (59.94 fields) or 30 frames (60 fields) per second. ATSC provides for MPEG-2 1280×720 progressive video at a frame rate of 23.976, 24, 29.97, 30, 59.94, or 60 frames per second. ATSC provides for MPEG-2 704/858×480 progressive video (SMPTE259M) at a frame rate of 23.976, 24, 29.97, 30, 59.94, or 60 frames per second. ATSC provides for MPEG-2 704/858×480 interlaced video (SMPTE259M) at a frame rate of 29.97 frames (59.94 fields) or 30 frames (60 fields) per second. ATSC provides for MPEG-2 640×480 progressive video at a frame rate of 23.976, 24, 29.97, 30, 59.94, or 60 frames per second. ATSC provides for MPEG-2 640×480 interlaced video at a frame rate of 29.97 frames (59.94 fields) or 30 frames (60 fields) per second. ATSC also supports other PAL frame rates and resolutions and supports the H.264 video codec with other frame rates and resolutions.

By way of example, for MPEG-2 with a resolution of 1920×1080 progressive video with a frame rate of 29.97 frames per second, the presentation time stamps are incremented by 3003 (with a 90 kHz clock resolution) between frames when properly incremented. In a similar manner, for H.264 with a resolution of 1920×1080 interlaced video with a field rate of 59.94 fields per second and field coded pictures, the presentation time stamps are incremented by a sequence of 1501/1502/1501/1502/ . . . (with a 90 kHz clock resolution) between fields when properly incremented. It is noted that 1501+1502 (for two sequential fields) is 3003, which is the frame duration in 90 kHz clock units. Accordingly, the presentation time stamps should be incremented between frames or fields in a uniform and consistent manner.
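The increments noted above follow from the 90 kHz clock and the exact rational rates 30000/1001 frames per second (29.97) and 60000/1001 fields per second (59.94). A minimal sketch, with a hypothetical function name and with truncation to whole clock units assumed as the rounding rule:

```python
from fractions import Fraction

PTS_CLOCK_HZ = 90_000


def pts_sequence(rate: Fraction, count: int, start: int = 0):
    """Ideal PTS values for `count` pictures at `rate` pictures per second."""
    period = Fraction(PTS_CLOCK_HZ) / rate          # exact duration in 90 kHz units
    return [start + int(n * period) for n in range(count)]


frame_pts = pts_sequence(Fraction(30_000, 1001), 4)   # 29.97 frames per second
field_pts = pts_sequence(Fraction(60_000, 1001), 5)   # 59.94 fields per second
field_steps = [b - a for a, b in zip(field_pts, field_pts[1:])]
```

Here the frame period is exactly 3003 units, while the field period is 1501.5 units, so truncation produces alternating steps of 1501 and 1502.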

The video transcoder, which modifies the presentation time stamps associated with particular video frames of the video content between its input and its output, has a tendency to create modified presentation time stamps that are offset by 1. This variability of the presentation time stamps from the preferred values tends to continue over time. Many presentation devices and associated decoders will tend to decode and render the frames in a suitable manner, even with jitter in the values of the presentation time stamps. Unfortunately, some presentation devices and associated decoders may fail to decode and render the frames in a suitable manner when sufficient jitter exists in the values of the presentation time stamps. Moreover, since the decode time stamps are often the same as the presentation time stamps for I and P frames, and appropriately modified for B-frames, the decode time stamps will likewise include jitter in the values if the presentation time stamps include jitter in the values. The video transcoder may introduce jitter into the presentation time stamps and/or the determination of the offset (previously described) may introduce jitter into the presentation time stamps. In either case, it is desirable to reduce the amount of jitter in the presentation time stamps, including the decode time stamps, to decrease the likelihood of the failure of the decoding and/or presentation of the video content.

Referring to FIG. 7, it was determined that since the video transcoder 700 has a tendency to modify the presentation time stamps encoded together with the video stream from that provided at its input to modified presentation time stamps provided from its output, in a manner that includes jitter in its values, it is desirable to remove or otherwise reduce the amount of jitter in the resulting presentation time stamps. Typically the video transcoder 700 includes the capability of processing an input audio stream 702 from a canned audio stream 707 that uses the most recent video PTS for the audio PES PTS header and puts the most recent video PTS in the audio frame (or private data), and an input video stream 704 using a multiplexer 703 (such as being provided separately or otherwise encoded together as a packetized elementary stream), while maintaining a synchronization of the video stream 704 and the corresponding audio stream 702 at its output.

A digital video stream 710 includes both a video stream 712 and an audio stream 714. The video stream 712 of the digital video stream 710 is provided to the video transcoder 700 as the input video stream 704. The audio stream 714 of the digital video stream 710 may be provided to the video transcoder 700 as the input audio stream 702.

As a result of the video transcoder modifying the presentation time stamps, it is desirable to read the presentation time stamps from the video stream 712 that is being provided to the video transcoder 700 by a video PTS and DTS extraction process 720. Since the decode time stamps are also modified by the video transcoder 700, it is also desirable to read the decode time stamps from the video stream 712 that is being provided to the video transcoder 700 by the video PTS and DTS extraction process 720. The presentation time stamps and the decode time stamps for a temporal time period are stored in a table 730. The table 730 preferably includes a defined temporal window of time for which data is retained, such as 10 seconds. In this manner, as new presentation time stamps and decode time stamps are added to the table 730 the older presentation time stamps and decode time stamps are removed from the table 730.
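One possible arrangement of such a table is sketched below. The class name, the use of a double-ended queue, and the choice to age entries by their decode time stamps are illustrative assumptions; time stamp wraparound is ignored for brevity.

```python
from collections import deque

PTS_CLOCK_HZ = 90_000


class TimestampTable:
    """Sliding window of (pts, dts) pairs read from the input video stream."""

    def __init__(self, window_seconds: float = 10.0):
        self.window_ticks = int(window_seconds * PTS_CLOCK_HZ)
        self.entries = deque()

    def add(self, pts: int, dts: int) -> None:
        """Store a new pair and drop pairs older than the retention window."""
        self.entries.append((pts, dts))
        # DTS values increase in stream order, so the oldest pair is at the left.
        while self.entries and dts - self.entries[0][1] > self.window_ticks:
            self.entries.popleft()


table = TimestampTable(window_seconds=10.0)
for n in range(400):                    # about 13.3 seconds of 29.97 fps video
    table.add(pts=n * 3003, dts=n * 3003)
```

After 400 frames at 3003 units per frame, only the most recent 300 pairs (about 10 seconds) remain in the table.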

The input audio stream 702 is synchronized with the video stream 704 (such as being provided separately or otherwise encoded together as a packetized elementary stream) and is provided to the video transcoder 700. The video transcoder 700 provides an output transport stream 740 that includes both a transcoded output video stream 742 and an output audio stream 744. In general, the jitter adjustment may be as follows (described in more detail below). The output audio stream 744 includes PES headers that include both the original PTS and the jittered PTS from the video transcoder 700, with the difference between them being an offset. The offset is subtracted from (or added to, depending on the manner of computing the difference) the video PTS/DTS/PCR. The output PTS+offset corresponds to the input PTS in the table except for jitter. The system determines the closest input PTS that matches the output PTS+offset. Note that the system adds the offset to the PCR as well such that the PTS/DTS and PCR are all adjusted by the same offset.

A presentation time stamp jitter determination process 750 receives the transcoded output video stream 742 and the output audio stream 744, and extracts the jittered presentation time stamps from the PES headers of both the audio and the video, as well as the original video PTS embedded in the audio stream. The output audio stream 744 includes PES headers that include both the original PTS and the jittered PTS from the video transcoder 700, with the difference between the two being an offset. The output video PTS+offset corresponds to the input video PTS in the table except for jitter. The presentation time stamp jitter determination process 750 compares the output video PTS+offset values against the extracted presentation time stamps included in the table of PTSs and DTSs 730. Based upon matching between the video time stamps computed using the offset generated from the output audio stream 744 and the extracted presentation time stamps included in the table of PTSs and DTSs 730, the presentation time stamp jitter determination process 750 determines the closest matching presentation time stamp from the table 730. A time stamp update process 760 modifies the presentation time stamp in the transcoded output video stream 742 to be the matching presentation time stamp from the table 730 identified by the presentation time stamp jitter determination process 750.
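By way of illustration only, the offset computation and the closest-match lookup may be sketched as follows. The function names, the sign convention for the offset (original minus jittered), and the use of a sorted list for the table are assumptions made for this sketch; time stamp wraparound is not handled.

```python
from bisect import bisect_left


def audio_offset(embedded_original_pts, pes_header_pts):
    """Offset between the original PTS embedded in the audio and the jittered PES-header PTS.

    Assumed sign convention: offset = original - jittered, so it is ADDED to output values.
    """
    return embedded_original_pts - pes_header_pts


def closest_input_pts(sorted_input_pts, output_video_pts, offset):
    """Return the table entry nearest to (output video PTS + offset).

    sorted_input_pts must be a non-empty list in ascending order.
    """
    target = output_video_pts + offset
    i = bisect_left(sorted_input_pts, target)
    # Only the neighbors on either side of the insertion point can be the nearest entry.
    candidates = sorted_input_pts[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda pts: abs(pts - target))
```

For example, with table entries 1000, 4754, and 8508, an offset of -500000, and an output video PTS of 504750, the target is 4750 and the entry selected is 4754; the 4-tick residual is the jitter that the update removes.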

The presentation time stamp jitter determination process 750 may also retrieve a matching decode time stamp from the table 730 based upon the matching presentation time stamp. The time stamp update process 760 may also modify the decode time stamp in the transcoded output video stream 742 to be the matching decode time stamp from the table 730 identified by the presentation time stamp jitter determination process 750.

An audio transcoder 770, if included, is used to transcode the audio stream 714 of the digital video stream (e.g., packetized elementary stream) 710. The output of the audio transcoder 770 may be combined 772 with the transcoded video stream with adjusted presentation time stamps and decode time stamps 762, such as into a packetized elementary stream 474. Also, the audio stream 714 may pass-through 771 (which may include a buffer) to the combiner 772.

As previously discussed, the transcoded video stream from the video transcoder 700 has a tendency to include some jitter, especially in the case when the video frames from the input and output do not have a one-to-one correlation. The lack of one-to-one correlation primarily occurs in the situation where the video transcoding modifies the field rate and/or frame rate of the video stream.

As previously mentioned, one of the frame rate conversions is the 3:2 pulldown technique that converts 24 frames per second into 29.97 (or 30) frames per second. Referring to FIG. 8, a set of frames 800 representative of film at 24 frames/second and a set of fields 810 representative of video at 30 frames/second (60 fields/second) are illustrated. Typically, a first frame 802 is transferred to three fields 812 of the set of fields 810. Typically, a second frame 804 is transferred to two fields 814 of the set of fields 810. Typically, a third frame 806 is transferred to three fields 816 of the set of fields 810. Typically, a fourth frame 808 is transferred to two fields 814 of the set of fields 810. In this manner, the process may be repeated for the video transcoder 700. It is also noted that the fields used for the frames alternate their field selection. For example, frame 802 includes field 1/field 2/field 1, while frame 806 includes field 2/field 1/field 2. For example, frame 804 includes field 2/field 1, while frame 808 includes field 1/field 2.

In the case of video content at 23.98 frames/second the presentation time stamps should have a difference of 3754/3754/3754/3753 (with a 90 kHz clock resolution) between frames when properly incremented. As a result of the 3:2 pulldown process the fields 810 should have presentation time stamps that are offset based upon the presentation time stamp of each frame 800. For example, the frame 802 should result in 3 fields 812, and accordingly the presentation time stamps of the 3 fields 812 should be offset by 1502/1501/1502. For example, the frame 804 should result in 2 fields 814, and accordingly the presentation time stamps of the 2 fields 814 should be offset by 1501/1502. In addition to the likelihood of jitter from the video transcoder 700 for the presentation time stamps of the fields matching those of the frames in a one-to-one manner, there is also a likelihood of jitter for the presentation time stamps in the remaining fields of the conversion process that does not match those of the frames in a one-to-one manner.
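By way of illustration only, the increments stated above can be reproduced with exact rational arithmetic. At 24000/1001 frames/second a frame lasts 3753.75 ticks of the 90 kHz clock, and at 60000/1001 fields/second a field lasts 1501.5 ticks. The description does not state how fractional ticks are rounded; rounding each time stamp up is an assumption of this sketch that happens to yield the listed patterns, and the function name `tick_increments` is hypothetical.

```python
import math
from fractions import Fraction

CLOCK_HZ = 90_000  # 90 kHz clock resolution


def tick_increments(rate, count):
    """Differences between consecutive time stamps at the given rate (units per second).

    Assumption: each ideal time stamp n/rate is rounded UP to a whole clock tick.
    """
    period = Fraction(CLOCK_HZ) / rate  # exact ticks per frame or field
    stamps = [math.ceil(n * period) for n in range(count + 1)]
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


frame_rate = Fraction(24_000, 1001)   # nominally 23.98 frames/second
field_rate = Fraction(60_000, 1001)   # nominally 59.94 fields/second

print(tick_increments(frame_rate, 4))  # [3754, 3754, 3754, 3753]
print(tick_increments(field_rate, 5))  # [1502, 1501, 1502, 1501, 1502]
```

The first three field increments correspond to the 3 fields of one frame and the next two to the 2 fields of the following frame.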

To accommodate the possibility of jitter in the fields that do not match those of the frames, such as a result of the 3:2 pulldown technique, the table 730 may further be expanded to create additional presentation time stamps for the frames 800. For example, for frame 802, the second field 2 and the third field 1 may be provided a corresponding presentation time stamp in the table 730, such as the presentation time stamp for frame 802 incremented by 1501 and incremented by 1501+1502, respectively. For example, for frame 806, the second field 1 and the third field 2 may be provided a corresponding presentation time stamp in the table 730, such as the presentation time stamp for frame 806 incremented by 1502 and incremented by 1502+1501, respectively. For example, for frame 804, the second field 1 may be provided a corresponding presentation time stamp in the table 730, such as the presentation time stamp for frame 804 incremented by 1501. For example, for frame 808, the second field 2 may be provided a corresponding presentation time stamp in the table 730, such as the presentation time stamp for frame 808 incremented by 1502. In a similar manner, to accommodate the possibility of jitter in the fields that do not match those of the frames, the table 730 may further be expanded to create additional decode time stamps for the frames 800.
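By way of illustration only, the expansion of the table for the 3:2 pulldown cadence may be sketched as follows. The increments are those given above for frames 802, 804, 806, and 808; the names `FIELD_INCREMENTS` and `expand_with_field_pts`, and the `cadence_start` parameter, are assumptions made for this sketch.

```python
# Extra entries per frame, in 90 kHz ticks, following the four-frame cadence described above.
FIELD_INCREMENTS = [
    (1501, 1501 + 1502),  # frame 802: three fields, two extra entries
    (1501,),              # frame 804: two fields, one extra entry
    (1502, 1502 + 1501),  # frame 806: three fields, two extra entries
    (1502,),              # frame 808: two fields, one extra entry
]


def expand_with_field_pts(frame_pts_list, cadence_start=0):
    """Return the frame PTS values plus synthesized PTS values for the remaining fields.

    cadence_start (hypothetical) selects which cadence position the first frame occupies.
    """
    expanded = []
    for index, pts in enumerate(frame_pts_list):
        expanded.append(pts)  # entry for the field that matches the frame one-to-one
        for increment in FIELD_INCREMENTS[(index + cadence_start) % 4]:
            expanded.append(pts + increment)  # entry with no corresponding source frame
    return expanded
```

Four input frames yield ten entries, one per output field, so every field of the transcoded stream has a candidate to match against.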

The presentation time stamp jitter determination process 750 may retrieve a matching presentation time stamp from the expanded table 730. The time stamp update process 760 may also modify the presentation time stamp in the transcoded output video stream 742 to be the matching presentation time stamp from the expanded table 730 identified by the presentation time stamp jitter determination process 750. The presentation time stamp jitter determination process 750 may also retrieve a matching decode time stamp from the expanded table 730 based upon the matching presentation or decode time stamp. The time stamp update process 760 may also modify the decode time stamp in the transcoded output video stream 742 to be the matching decode time stamp from the expanded table 730 identified by the presentation time stamp jitter determination process 750. In this manner, the presentation time stamps and the decode time stamps may be updated accordingly to reduce jitter, even though a corresponding frame was not present in the source video content.

Often, the video stream includes multiple video clips that are streamed together in a serial fashion with one another. As a result of having multiple video clips that are streamed together, the presentation time stamps between respective video clips normally include a discontinuity. This discontinuity in the presentation time stamps also occurs when a video clip wraps around its end in a serial presentation.

The video transcoder 700 unfortunately often processes the input video stream in a manner where any discontinuity in the presentation time stamps, typically associated with different video segments, results in a discontinuity of the presentation time stamps in the transcoded video stream not being aligned with the discontinuity in the presentation time stamps of the input video stream. Accordingly, the presentation time stamps of the transcoded video stream for a first video segment may be sequentially extended into a portion of a second video segment temporally after the first video segment. Accordingly, the presentation time stamps of the transcoded video stream for the second video segment may be sequentially extended into a portion of the first video segment temporally prior to the second video segment.

Unfortunately, when attempting to modify the resulting video stream to account for jitter and modifying the resulting video stream to account for offsets in the presentation time stamps, it may be difficult to accurately determine the proper location of the discontinuity based upon the presentation time stamps of the input video stream. Moreover, if the presentation time stamps from the video transcoder appear to be in error, often the frames associated with the presentation time stamps are discarded as being in error. Further, when attempting an advertisement insertion process into the transcoded video stream, it is problematic to insert the advertisement in the discontinuity between the segments since the discontinuity in the presentation time stamps does not necessarily match the discontinuity in the video frames.

To accommodate the possibility of presentation time stamps not suitably matching up in an area of a discontinuity, the table 730 may further be expanded to create additional presentation time stamps for the frames 800 proximate those areas of a discontinuity in the series of the presentation time stamps. A discontinuity in the presentation time stamps may be determined based upon the sequence of increments in the presentation time stamps being substantially different than what is expected, such as a difference of greater than 5%.
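By way of illustration only, the detection of a discontinuity may be sketched as follows. The function name `find_discontinuities`, the use of a single expected increment, and the assumption that the time stamps are supplied in presentation order are choices made for this sketch; the 5% tolerance is the example threshold given above.

```python
def find_discontinuities(pts_list, expected_increment, tolerance=0.05):
    """Return the indices at which a new segment appears to start.

    An increment is flagged when it deviates from expected_increment by more than
    the given fraction (5% by default). pts_list is assumed to be in presentation order.
    """
    breaks = []
    for index in range(1, len(pts_list)):
        delta = pts_list[index] - pts_list[index - 1]
        if abs(delta - expected_increment) > tolerance * expected_increment:
            breaks.append(index)
    return breaks
```

A one-tick variation such as 3753 versus 3754 stays well inside the tolerance, while a jump between clips exceeds it by orders of magnitude.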

Referring to FIG. 9, the table 730 is expanded to include a series of additional presentation time stamps. A first video segment 900 may include an associated set of video frames and presentation time stamps 910. A second video segment 920 may include an associated set of video frames and presentation time stamps 930. A discontinuity 940 exists in between the first video segment 900 and the second video segment 920, which also is expressed as a discontinuity in the presentation time stamps of the first video segment 900 and the second video segment 920. When a discontinuity is identified in the presentation time stamps, such as by a sequence of the presentation time stamps including a sufficient discontinuity, the table 730 is expanded with additional presentation time stamps. A first expanded series of presentation time stamps 950 is determined by virtually extending (in a forward manner) the presentation time stamps of the first video segment 900, while there are no actual corresponding video frames for the first expanded series of presentation time stamps 950. A second expanded series of presentation time stamps 960 is determined by virtually extending (in a backward manner) the presentation time stamps of the second video segment 920, while there are no actual corresponding video frames for the second expanded series of presentation time stamps 960. The result is a set of presentation time stamps 970 for the first video segment 900 and a set of presentation time stamps 980 for the second video segment 920. By way of example, the first expanded series of presentation time stamps 950 and the second expanded series of presentation time stamps 960 may be 1 second in duration.
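By way of illustration only, the forward and backward virtual extension around a discontinuity may be sketched as follows. The function name `extend_across_discontinuity` and the use of a single fixed increment are assumptions made for this sketch; the default of 1 second follows the example above, and time stamp wraparound is not handled.

```python
CLOCK_HZ = 90_000  # 90 kHz clock resolution


def extend_across_discontinuity(first_segment, second_segment, increment,
                                duration_ticks=CLOCK_HZ):
    """Return (extended first segment, extended second segment) as lists of PTS values.

    The first segment is extended forward and the second backward by duration_ticks,
    using a fixed increment. The added values have no corresponding video frames.
    """
    count = duration_ticks // increment
    # Forward extension: continue counting up from the last PTS of the first segment.
    forward = [first_segment[-1] + k * increment for k in range(1, count + 1)]
    # Backward extension: count down from the first PTS of the second segment,
    # listed in ascending order.
    backward = [second_segment[0] - k * increment for k in range(count, 0, -1)]
    return first_segment + forward, backward + second_segment
```

With the extended sets, a transcoded time stamp that continues the first segment's numbering past the true boundary, or that starts the second segment's numbering early, still finds a nearby entry to match.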

With the table 730 expanded to include the additional presentation time stamps, these presentation time stamps may be used with the jitter reduction process and/or with the delta presentation time stamp determination process for accurate adjustments.

The offset process, the jitter process, and/or the discontinuity process may be combined with one another, as desired. In addition, the table may be in any format or manner, inclusive of any data structure or otherwise, stored in memory or a storage device.

Moreover, each functional block or various features in each of the aforementioned embodiments may be implemented or executed by a circuitry, which is typically an integrated circuit or a plurality of integrated circuits. The circuitry designed to execute the functions described in the present specification may comprise a general-purpose processor, a digital signal processor (DSP), an application specific or general application integrated circuit (ASIC), a field programmable gate array (FPGA), or other programmable logic devices, discrete gates or transistor logic, or a discrete hardware component, or a combination thereof. The general-purpose processor may be a microprocessor, or alternatively, the processor may be a conventional processor, a controller, a microcontroller or a state machine. The general-purpose processor or each circuit described above may be configured by a digital circuit or may be configured by an analogue circuit. Further, when a technology of making into an integrated circuit superseding integrated circuits at the present time appears due to advancement of a semiconductor technology, the integrated circuit by this technology is also able to be used.

It will be appreciated that the invention is not restricted to the particular embodiment that has been described, and that variations may be made therein without departing from the scope of the invention as defined in the appended claims, as interpreted in accordance with principles of prevailing law, including the doctrine of equivalents or any other principle that enlarges the enforceable scope of a claim beyond its literal scope. Unless the context indicates otherwise, a reference in a claim to the number of instances of an element, be it a reference to one instance or more than one instance, requires at least the stated number of instances of the element but is not intended to exclude from the scope of the claim a structure or method having more instances of that element than stated. The word “comprise” or a derivative thereof, when used in a claim, is used in a nonexclusive sense that is not intended to exclude the presence of other elements or steps in a claimed structure or method.

I/We claim:
 1. A method for transcoding a digital video stream comprising: (a) receiving a digital video stream that includes an input video stream and an input audio stream; (b) extracting a first set of presentation time stamps from said input video stream; (c) embedding said first set of presentation time stamps into a first audio stream in a first location; (d) providing said input video stream together with said first audio stream to a transcoder in a synchronized manner with each other; (e) transcoding by said transcoder said input video stream including said first set of presentation time stamps from an initial set of characteristics to a modified set of characteristics including a second set of presentation time stamps that are different from said first set of presentation time stamps, and providing said transcoded input video stream and said first audio stream from said transcoder in a synchronized manner with each other; (f) determining an offset of said second set of presentation time stamps of said transcoded video stream based upon said first set of presentation time stamps embedded in said transcoding audio stream from said transcoder; (g) combining said transcoded video stream and said input audio stream based upon said offset.
 2. The method of claim 1 wherein said input video stream includes video frames and said input audio stream includes audio frames, where said input video stream and said input audio stream are received as an input packetized elementary stream.
 3. The method of claim 1 wherein said first location includes at least one of (1) a transport stream packet table 2-2; (2) a transport stream adaptation field table 2-6; (3) a packetized elementary stream packet table 2-17; (4) a packetized elementary stream packet header; (5) a packetized elementary stream packet data byte field; (6) a descriptor within a program stream; (7) a descriptor within a transport stream; and (8) a private section table 2-30.
 4. The method of claim 1 wherein said first audio stream includes said input audio stream.
 5. The method of claim 1 wherein said first audio stream does not include said input audio stream.
 6. The method of claim 1 wherein said first audio stream is free from being transcoded by said transcoder.
 7. The method of claim 1 wherein said first audio stream is transcoded by said transcoder.
 8. The method of claim 1 wherein said transcoded video stream includes video frames and said first audio stream includes audio frames, where said transcoded video stream and said first audio stream are provided as an output packetized elementary stream.
 9. The method of claim 1 wherein said combining said transcoded video stream and said input audio stream based upon said offset is a packetized elementary stream.
 10. The method of claim 1 wherein said input audio stream is transcoded by said an audio transcoder.
 11. The method of claim 10 wherein said transcoded video stream includes video frames and said transcoded audio stream includes audio frames, where said transcoded video stream and said transcoded audio stream are provided as an output packetized elementary stream.
 12. A method for transcoding a digital video stream comprising: (a) transcoding using a transcoder a video stream that includes presentation time stamps for said video stream together with an audio stream that includes presentation time stamps for said audio stream in a manner that modifies said presentation time stamps for said video stream in a manner such that a plurality of first values for presentation time stamps for a first set of video frames of said video stream are modified to a plurality of second values for presentation time stamps for said first set of video frames, where said audio stream includes embedded said first values for presentation time stamps in a first location; (b) determining an offset of said second values of said second set of presentation time stamps of said transcoded video stream based upon said first values of said set of presentation time stamps embedded in said audio stream from said transcoder; (c) combining said transcoded video stream and an associated audio stream based upon said offset.
 13. The method of claim 12 wherein said video stream includes video frames and said audio stream includes audio frames, where said video stream and said audio stream are received as an input packetized elementary stream by said transcoder.
 14. The method of claim 12 wherein said first location includes at least one of (1) a transport stream packet table 2-2; (2) a transport stream adaptation field table 2-6; (3) a packetized elementary stream packet table 2-17; (4) a packetized elementary stream packet header; (5) a packetized elementary stream packet data byte field; (6) a descriptor within a program stream; (7) a descriptor within a transport stream; and (8) a private section table 2-30.
 15. The method of claim 12 wherein said audio stream includes audio corresponding to content of said video stream.
 16. The method of claim 12 wherein said audio stream is free from including audio corresponding to content of said video stream.
 17. The method of claim 12 wherein said audio stream is transcoded by said transcoder.
 18. The method of claim 12 wherein said combining said transcoded video stream and said associated audio stream is based upon said offset is a packetized elementary stream.
 19. A method for transcoding a digital video stream comprising: (a) transcoding using a transcoder a video stream that includes presentation time stamps for said video stream together with an audio stream that includes presentation time stamps for said audio stream in a manner that modifies said presentation time stamps for said video stream in a manner such that a plurality of first values for presentation time stamps for a first set of video frames of said video stream are modified to a plurality of second values for presentation time stamps for said first set of video frames, where said audio stream includes embedded said first values for presentation time stamps in a first location; (b) determining an offset of said second values of said second set of presentation time stamps of said transcoded video stream based upon said first values of said set of presentation time stamps embedded in said audio stream from said transcoder.
 20. The method of claim 19 further comprising modifying said plurality of second values for presentation time stamps for said first set of video frames of said transcoded video stream based upon said offset.